Advancing Connectionist Temporal Classification With Attention Modeling
Abstract
In this study, we propose advancing all-neural speech recognition by directly incorporating attention modeling within the Connectionist Temporal Classification (CTC) framework. In particular, we derive new context vectors using time convolution features to model attention as part of the CTC network. To further improve attention modeling, we utilize content information extracted from a network representing an implicit language model. Finally, we introduce vector-based attention weights that are applied to context vectors across both time and their individual components. We evaluate our system on a 3400-hour Microsoft Cortana voice assistant task and demonstrate that our proposed model consistently outperforms the baseline model, achieving about a 20% relative reduction in word error rate.
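The idea of vector-based attention weights, applied across both time and individual components, can be illustrated with a minimal sketch. This is not the authors' formulation: the function name `vector_attention_context`, the window of feature vectors, and the unnormalized `scores` are all hypothetical, and the scoring network that would produce those scores is omitted. The sketch only shows the weighting step, assuming each component of the feature vector gets its own softmax over the frames in the window.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def vector_attention_context(window, scores):
    """Combine a window of feature vectors into one context vector.

    window: list of T feature vectors (each of length D) around the
            current frame (hypothetical input).
    scores: list of T score vectors (each of length D), i.e. one
            unnormalized attention energy per frame AND per component
            (hypothetical; a real model would compute these).
    Returns a length-D context vector.
    """
    T, D = len(window), len(window[0])
    context = []
    for d in range(D):
        # Normalize over time separately for component d, so every
        # component has its own attention distribution over the frames.
        alpha = softmax([scores[t][d] for t in range(T)])
        context.append(sum(alpha[t] * window[t][d] for t in range(T)))
    return context


if __name__ == "__main__":
    window = [[1.0, 0.0], [2.0, 4.0], [3.0, 8.0]]
    uniform = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    # With equal scores, each component is averaged over the window.
    print(vector_attention_context(window, uniform))
```

With scalar attention, a single weight per frame would scale the whole feature vector; here each component can attend to a different frame, which is the distinction the abstract draws.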
Similar resources
Computational modeling of dynamic decision making using connectionist networks
In this research, connectionist modeling of decision making is presented. Important areas for decision making in the brain are the thalamus, prefrontal cortex, and amygdala. A connectionist model with 3 parts representing these 3 areas is built based on the results of the Iowa Gambling Task. In many studies the Iowa Gambling Task is used to study emotional decision making. In these kind of decisio...
End-to-End Speech Recognition with Auditory Attention for Multi-Microphone Distance Speech Recognition
End-to-End speech recognition is a recently proposed approach that directly transcribes input speech to text using a single model. End-to-End speech recognition methods including Connectionist Temporal Classification and Attention-based Encoder Decoder Networks have been shown to obtain state-of-the-art performance on a number of tasks and significantly simplify the modeling, training and decodi...
Nasal Speech Sounds Detection Using Connectionist Temporal Classification
Phone attributes, also known as distinctive or phonological features, belong to an important classification of the speech sounds used in automatic speech processing. Training of conventional phone attribute detectors (classifiers), either based on acoustic measurements or deep learning approaches, requires decent phone boundary segmentation. This paper proposes a solution to train a phone attribut...
Integrated Gene Expression Analysis of Multiple Microarray Data Sets Based on a Normalization Technique and on Adaptive Connectionist Model
Research with microarray gene expression analysis has primarily been on expression profiling based on one set of microarray data. This paper presents a novel approach to integrated analysis and modeling of microarray data from multiple sources. A normalization method is applied to different data sets before they are used together in an adaptive connectionist classification system. The method is d...
The simultaneous type, serial token model of temporal attention and working memory.
A detailed description of the simultaneous type, serial token (ST2) model is presented. ST2 is a model of temporal attention and working memory that encapsulates 5 principles: (a) M. M. Chun and M. C. Potter's (1995) 2-stage model, (b) a Stage 1 salience filter, (c) N. G. Kanwisher's (1987, 1991) types-tokens distinction, (d) a transient attentional enhancement, and (e) a mechanism for associat...
Publication date: 2018